Ingolstadt
c82836ed448c41094025b4a872c5341e-Supplemental.pdf
Recently there has been significant theoretical progress on understanding the convergence and generalization of gradient-based methods on nonconvex losses with overparameterized models. Nevertheless, many aspects of optimization and generalization, and in particular the critical role of small random initialization, are not fully understood.
- North America > United States > New York > New York County > New York City (0.14)
- Asia > Middle East > Jordan (0.04)
- North America > United States > Maryland > Baltimore (0.04)
- Europe > Germany > Bavaria > Upper Bavaria > Ingolstadt (0.04)
c82836ed448c41094025b4a872c5341e-Paper.pdf
Recently there has been significant theoretical progress on understanding the convergence and generalization of gradient-based methods on nonconvex losses with overparameterized models. Nevertheless, many aspects of optimization and generalization, and in particular the critical role of small random initialization, are not fully understood.
- North America > United States > New York > New York County > New York City (0.05)
- Asia > Middle East > Jordan (0.05)
- North America > United States > Maryland > Baltimore (0.04)
- Europe > Germany > Bavaria > Upper Bavaria > Ingolstadt (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.05)
- Asia > Middle East > Jordan (0.04)
- North America > United States > Utah > Salt Lake County > Salt Lake City (0.04)
- Europe > Germany > Bavaria > Upper Bavaria > Ingolstadt (0.04)
- Research Report (0.68)
- Instructional Material (0.46)
- Health & Medicine (0.68)
- Transportation > Infrastructure & Services (0.50)
- Transportation > Ground > Road (0.50)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- Europe > Germany > Bavaria > Upper Bavaria > Ingolstadt (0.04)
- North America > United States > District of Columbia > Washington (0.04)
- Asia > China > Beijing > Beijing (0.04)
Optimization, Generalization and Differential Privacy Bounds for Gradient Descent on Kolmogorov-Arnold Networks
Wang, Puyu, Zhou, Junyu, Liznerski, Philipp, Kloft, Marius
Kolmogorov-Arnold Networks (KANs) have recently emerged as a structured alternative to standard MLPs, yet a principled theory for their training dynamics, generalization, and privacy properties remains limited. In this paper, we analyze gradient descent (GD) for training two-layer KANs and derive general bounds that characterize their training dynamics, generalization, and utility under differential privacy (DP). As a concrete instantiation, we specialize our analysis to logistic loss under an NTK-separable assumption, where we show that polylogarithmic network width suffices for GD to achieve an optimization rate of order $1/T$ and a generalization rate of order $1/n$, with $T$ denoting the number of GD iterations and $n$ the sample size. In the private setting, we characterize the noise required for $(ε,δ)$-DP and obtain a utility bound of order $\sqrt{d}/(nε)$ (with $d$ the input dimension), matching the classical lower bound for general convex Lipschitz problems. Our results imply that polylogarithmic width is not only sufficient but also necessary under differential privacy, revealing a qualitative gap between non-private (sufficiency only) and private (necessity also emerges) training regimes. Experiments further illustrate how these theoretical insights can guide practical choices, including network width selection and early stopping.
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Germany > Rhineland-Palatinate > Kaiserslautern (0.04)
- Europe > Germany > Bavaria > Upper Bavaria > Ingolstadt (0.04)
On The Role of Pretrained Language Models in General-Purpose Text Embeddings: A Survey
Zhang, Meishan, Zhang, Xin, Zhao, Xinping, Huang, Shouzheng, Hu, Baotian, Zhang, Min
Text embeddings have attracted growing interest due to their effectiveness across a wide range of natural language processing (NLP) tasks, including retrieval, classification, clustering, bitext mining, and summarization. With the emergence of pretrained language models (PLMs), general-purpose text embeddings (GPTE) have gained significant traction for their ability to produce rich, transferable representations. The general architecture of GPTE typically leverages PLMs to derive dense text representations, which are then optimized through contrastive learning on large-scale pairwise datasets. In this survey, we provide a comprehensive overview of GPTE in the era of PLMs, focusing on the roles PLMs play in driving its development. We first examine the fundamental architecture and describe the basic roles of PLMs in GPTE, i.e., embedding extraction, expressivity enhancement, training strategies, learning objectives, and data construction. We then describe advanced roles enabled by PLMs, including multilingual support, multimodal integration, code understanding, and scenario-specific adaptation. Finally, we highlight potential future research directions that move beyond traditional improvement goals, including ranking integration, safety considerations, bias mitigation, structural information incorporation, and the cognitive extension of embeddings. This survey aims to serve as a valuable reference for both newcomers and established researchers seeking to understand the current state and future potential of GPTE.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Asia > China > Guangdong Province > Shenzhen (0.05)
- Asia > China > Heilongjiang Province > Harbin (0.04)
- (17 more...)
- Overview (1.00)
- Research Report > New Finding (0.46)
- Research Report > Promising Solution (0.45)
- Health & Medicine (1.00)
- Information Technology > Security & Privacy (0.45)
SAGE: An Agentic Explainer Framework for Interpreting SAE Features in Language Models
Han, Jiaojiao, Xu, Wujiang, Jin, Mingyu, Du, Mengnan
Large language models (LLMs) have achieved remarkable progress, yet their internal mechanisms remain largely opaque, posing a significant challenge to their safe and reliable deployment. Sparse autoencoders (SAEs) have emerged as a promising tool for decomposing LLM representations into more interpretable features, but explaining the features captured by SAEs remains a challenging task. In this work, we propose SAGE (SAE AGentic Explainer), an agent-based framework that recasts feature interpretation from a passive, single-pass generation task into an active, explanation-driven process. SAGE implements a rigorous methodology by systematically formulating multiple explanations for each feature, designing targeted experiments to test them, and iteratively refining explanations based on empirical activation feedback. Experiments on features from SAEs of diverse language models demonstrate that SAGE produces explanations with significantly higher generative and predictive accuracy compared to state-of-the-art baselines.
- Europe > Austria > Vienna (0.14)
- Asia > China (0.05)
- North America > United States > New Jersey (0.04)
- (6 more...)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.55)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
TrueCity: Real and Simulated Urban Data for Cross-Domain 3D Scene Understanding
Nguyen, Duc, Lai, Yan-Ling, Zhang, Qilin, Gyawali, Prabin, Schwab, Benedikt, Wysocki, Olaf, Kolbe, Thomas H.
3D semantic scene understanding remains a long-standing challenge in the 3D computer vision community. One of the key issues is the limited availability of annotated real-world data for training generalizable models. The common practice to tackle this issue is to simulate new data. Although synthetic datasets offer scalability and perfect labels, their designer-crafted scenes fail to capture real-world complexity and sensor noise, resulting in a synthetic-to-real domain gap. Moreover, no benchmark provides synchronized real and simulated point clouds for segmentation-oriented domain shift analysis. We introduce TrueCity, the first urban semantic segmentation benchmark with cm-accurate annotated real-world point clouds, semantic 3D city models, and annotated simulated point clouds representing the same city. TrueCity proposes segmentation classes aligned with international 3D city modeling standards, enabling consistent evaluation of the synthetic-to-real gap. Our extensive experiments on common baselines quantify domain shift and highlight strategies for exploiting synthetic data to enhance real-world 3D scene understanding. We are convinced that the TrueCity dataset will foster further development of sim-to-real gap quantification and enable generalizable data-driven models. The data, code, and 3D models are available online: https://tum-gis.github.io/TrueCity/
- Europe > Slovenia > Drava > Municipality of Benedikt > Benedikt (0.05)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
- (15 more...)
- Information Technology (0.93)
- Transportation > Ground > Road (0.46)